<!-- $Id: package.html,v 1.2 2001/05/17 19:00:55 dfs Exp $ -->
<body>
This package used to be the OROMatcher library and provides both
generic regular expression interfaces and Perl5 regular expression
compatible implementation classes.

<p>
<em>Note: The following information will be moved into the user's guide.</em>
</p>

<h1> Perl5 regular expressions </h1>
</a>
<p>
Here we summarize the syntax of Perl5.003 regular expressions, all of
which is supported by the Perl5 classes in this package. However, for
a definitive reference, you should consult the 
<a href="http://www.perl.org/CPAN/doc/manual/html/pod/perlre.html">
<code>perlre</code> man page </a>
that accompanies the Perl5 distribution and also the book
<em> Programming Perl, 2nd Edition </em> from O'Reilly & Associates.
We are working toward implementing the features added after Perl5.003
up to and including Perl 5.6.  Please remember, we only guarantee
support for Perl5.003 expressions in version 2.0.

<p>
<ul>
<li> Alternatives separated by |
<li> Quantified atoms
 <dl compact>
      <dt> {n,m} <dd> Match at least n but not more than m times.
      <dt> {n,}  <dd> Match at least n times.
      <dt> {n}   <dd> Match exactly n times.  
      <dt> *     <dd> Match 0 or more times.
      <dt> +     <dd> Match 1 or more times.
      <dt> ?     <dd> Match 0 or 1 times.
 </dl>
 <li> Atoms
 <ul>
     <li> regular expression within parentheses
     <li> a . matches everything except \n
     <li> a ^ is a null token matching the beginning of a string or line
          (i.e., the position right after a newline or right before
           the beginning of a string)
     <li> a $ is a null token matching the end of a string or line
          (i.e., the position right before a newline or right after
           the end of a string)
     <li> Character classes (e.g., [abcd]) and ranges (e.g. [a-z])
     <ul>
         <li> Special backslashed characters work within a character
              class (except for backreferences and boundaries).  
         <li> \b is backspace inside a character class
     </ul>
     <li> Special backslashed characters
     <dl compact>
         <dt> \b <dd> null token matching a word boundary (\w on one side
                      and \W on the other)
         <dt> \B <dd> null token matching a boundary that isn't a
                      word boundary
         <dt> \A <dd> Match only at beginning of string
         <dt> \Z <dd> Match only at end of string (or before newline
                      at the end)
         <dt> \n <dd> newline
         <dt> \r <dd> carriage return
         <dt> \t <dd> tab
         <dt> \f <dd> formfeed
         <dt> \d <dd> digit [0-9]
         <dt> \D <dd> non-digit [^0-9]
         <dt> \w <dd> word character [0-9a-z_A-Z]
         <dt> \W <dd> a non-word character [^0-9a-z_A-Z]
         <dt> \s <dd> a whitespace character [ \t\n\r\f]
         <dt> \S <dd> a non-whitespace character [^ \t\n\r\f]
         <dt> \xnn <dd> hexadecimal representation of character
         <dt> \cD <dd> matches the corresponding control character
         <dt> \nn or \nnn <dd> octal representation of character
                               unless a backreference.  a 
         <dt> \1, \2, \3, etc. <dd> match whatever the first, second,
          third, etc. parenthesized group matched.  This is called a
          backreference.  If there is no corresponding group, the
          number is interpreted as an octal representation of a character.
         <dt> \0 <dd> matches null character
         <dt> Any other backslashed character matches itself
     </dl>
 </ul>
 <li> Expressions within parentheses are matched as subpattern groups
      and saved for use by certain methods.
 </ul>

<p>
By default, a quantified subpattern is <em> greedy </em>.
In other words it matches as many times as possible without causing
the rest of the pattern not to match. To change the quantifiers
to match the minimum number of times possible, without
causing the rest of the pattern not to match, you may use
a "?" right after the quantifier.

<dl compact>
<dt> *?     <dd> Match 0 or more times
<dt> +?     <dd> Match 1 or more times
<dt> ??     <dd> Match 0 or 1 time
<dt> {n}?   <dd> Match exactly n times
<dt> {n,}?  <dd> Match at least n times
<dt> {n,m}? <dd> Match at least n but not more than m times
</dl>

<p>
<b> Perl5 extended regular expressions </b> are fully supported.

<dl compact>
<dt> (?#text) <dd> An embedded comment causing text to be ignored.
<dt> (?:regexp) <dd> Groups things like "()" but doesn't cause the
 group match to be saved.
<dt> (?=regexp) <dd>
                 A zero-width positive lookahead assertion.  For
                 example, \w+(?=\s) matches a word followed by
                 whitespace, without including whitespace in the
                 MatchResult.

<dt> (?!regexp) <dd>
                 A zero-width negative lookahead assertion.  For
                 example foo(?!bar) matches any occurrence of
                 "foo" that isn't followed by "bar".  Remember
                 that this is a zero-width assertion, which means
                 that a(?!b)d will match ad because a is followed
                 by a character that is not b (the d) and a d
                 follows the zero-width assertion.


<dt> (?imsx) <dd> One or more embedded pattern-match modifiers.
                i enables case insensitivity, m enables multiline
                treatment of the input, s enables single line treatment
                of the input, and x enables extended whitespace comments.
</ul>


</body>

